DETAILED ACTION

Continued Examination Under 37 CFR 1.114 

1. A request for continued examination under 37 CFR 1.114 was filed in this
application after a decision by the Board of Patent Appeals and Interferences, but 
before the filing of a Notice of Appeal to the Court of Appeals for the Federal Circuit or 
the commencement of a civil action. Since this application is eligible for continued 
examination under 37 CFR 1.114 and the fee set forth in 37 CFR 1.17(e) has been 
timely paid, the appeal has been withdrawn pursuant to 37 CFR 1.114 and prosecution 
in this application has been reopened pursuant to 37 CFR 1.114. Applicant's 
submission filed on 11/8/05 has been entered.

Claim Rejections - 35 USC § 112

2. The following is a quotation of the second paragraph of 35 U.S.C. 112: 

The specification shall conclude with one or more claims particularly pointing out and distinctly 
claiming the subject matter which the applicant regards as his invention. 

3. Claim 1 recites the limitation "audio command" in line 10. There is insufficient 
antecedent basis for this limitation in the claim. 

Claim Rejections - 35 USC § 103 

4. The following is a quotation of 35 U.S.C. 103(a) which forms the basis for all 
obviousness rejections set forth in this Office action: 

(a) A patent may not be obtained though the invention is not identically disclosed or described as set 
forth in section 102 of this title, if the differences between the subject matter sought to be patented and
the prior art are such that the subject matter as a whole would have been obvious at the time the 
invention was made to a person having ordinary skill in the art to which said subject matter pertains. 
Patentability shall not be negatived by the manner in which the invention was made. 

5. Claims 1-18 and 20-21 are rejected under 35 U.S.C. 103(a) as being
unpatentable over Inagaki (US Patent No. 5,999,214) in view of Pavlovic et al 
("Integration of audio/visual information for use in human-computer intelligent 
interaction", Image processing, 1997 Proceedings IEEE, pages 121-124) and Cox et al 
(US Patent No. 6,154,723). 

With regard to claim 1, Inagaki teaches a video display device comprising: a
display configured to display a primary image and a picture-in-picture image 
(PIP) overlaying the primary image (Figure 11, items 13 and 17); and a processor 
operatively coupled to the display and configured to receive a first video data 
stream for the primary image, to receive a second video data stream for the PIP 
(Figure 11, items 22 and 16). 

Inagaki does not teach, "and to change a PIP display characteristic in response
to a received audio indication and a related gesture from a user, wherein the
processor is configured to receive the related gesture from the user in response
to the received audio command". Inagaki's apparatus instead detects and
responds to any of many sounds or "audio indications", in the form of the unique
voice of a specific speaking attendee, with the same command, which is to move
the camera and highlight the PIP of the speaking attendee, and does not depend
on a "related gesture from a user" (figure 11 "VOICE DIRECTION DETECTION
UNIT", column 3, lines 31-33, column 10, lines 16-25).
However, Pavlovic does demonstrate the concept of a system utilizing a
combination of "audio commands" and a "related gesture" from a user as a
means of controlling a graphical object on a display, which is analogous to where
Inagaki controlled a specific graphical object such as a PIP on a display (see
Pavlovic page 123, section 3, Experimental Results).

Therefore, it would have been obvious to one of ordinary skill in the art at the time
of the invention to use a "received audio command and related gesture from a
user", as taught by Pavlovic, in the apparatus of Inagaki, because of the
motivation directly provided by Pavlovic: "Psychological studies, for example,
show that people prefer to use hand gestures in combination with speech in a
virtual environment, since they allow the user to interact without special training
or special apparatus". Pavlovic further teaches that "words or gestures alone can
be used". Therefore, it would have been obvious to one of ordinary skill in the art at
the time of the invention to use words and gestures alternatively, or
simultaneously, to control the data inputting, since it merely depends on the
user's preference and the type of application being used. Any level of
integration of the voice commands and gesture commands would perform
equally well in providing input to the computer. Furthermore, it would have been
an obvious matter of design choice to choose whether to enter a voice command
first, then a gesture command, or in the opposite order, since it merely depends on
the function being performed and the assignments of the commands. For
example, if movement of the cursor is controlled by gesture commands and
selection of a menu item is input by voice commands, then whether a voice
command or a gesture command is needed first would depend on the current
position of the cursor: gesture commands first if the user needs to move the
cursor, but voice commands first if the user wants to select the currently
highlighted menu item (this reads on the limitation of "the processor is configured
to receive the related gesture from the user in response to the received audio
command"). As evidence, Cox teaches a data inputting system for a computer
using voice commands and gesture commands, wherein some voice commands
trigger input from gesture commands (column 5, lines 10-19).
Consider claim 2. Inagaki as modified teaches the method for inputting data to a
video display device having PIP windows. Therefore, it would have been obvious
to one of ordinary skill in the art at the time of the invention to use the data for
controlling any parameter changes, including size adjustment of the PIP window,
so as to enable simple and precise data inputting for controlling the size
adjustment of the PIP window.

With regard to claim 3, Inagaki as modified teaches the video display device of
claim 1, comprising a microphone for receiving the audio indication from the
user; and a camera for acquiring an image of the user containing the related
gesture (See Inagaki figure 11).

With regard to claim 4, Inagaki as modified teaches the video display device of 
claim 1 wherein the processor is configured to analyze audio information 
received from the user to identify when a PIP related audio indication is intended 
by the user (See Inagaki figure 8a and 8b). 
With regard to claim 5, Inagaki as modified teaches the video display device of
claim 1, wherein the processor is configured to analyze image information
received from the user after the audio indication is received to identify the change
in the PIP display characteristic that is expressed by the received gesture (See
Inagaki figures 8a and 8b and Pavlovic et al figures 6-8, and especially
Pavlovic figure 5, "HIGH LEVEL FEATURE INTEGRATION", where it is obvious that
the pre-analysis step is to simultaneously receive the video and audio data using
the camera and the microphone. The data is then split between parallel visual and
audio estimator/classifier modules, followed by a second stage which
contains a feature integration/combination module, where the combination
module computes the likelihood of the pairs of gestures and verbal words. The
claim language is very broad here, and Pavlovic clearly receives both the
audio and video before he analyzes the video or audio data; this is just the logical
progression claimed).

With regard to claim 6, Inagaki as modified teaches the video display device of
claim 5, wherein the image information is contained in a sequence of images and
wherein the processor is configured to analyze the sequence of images to
determine the received gesture (since a gesture can be a motion, which would
require a sequence of images to detect, this feature is obvious to the system of
Inagaki and Pavlovic; also see Pavlovic section 2.1).

With regard to claim 7, the
combination of Inagaki and Pavlovic teaches the video display device of claim 1,
wherein the image information is contained in a sequence of images and wherein
the processor is configured to determine the received gesture by analyzing the
sequence of images and determining a trajectory of a hand of the user (since a
gesture can be a motion, which would require a sequence of images to detect, this
feature is obvious to the system of Inagaki and Pavlovic and is merely viewed as
directed towards an obvious intended use of which the combination is
capable; also see Pavlovic section 2.1).

With regard to claim 8, Inagaki as modified teaches the video display device of
claim 1, wherein the processor is configured to determine the received gesture
by analyzing an image of the user and determining a posture of a hand of the
user (since a gesture can be a posture of a hand, this feature is obvious to the
system of Inagaki and Pavlovic and is merely viewed as directed towards an
obvious intended use of which the combination is capable; also see
Pavlovic section 2.1).

With regard to claim 9, Inagaki as modified suggests the video display device of
claim 1, wherein the video display device is a television (since Pavlovic shows a
projection screen in figure 6, and since it is also well known in the prior art that
televisions use projection screens, one would be motivated to have a projection
screen with a dual use, such as conferencing and watching the game; this is merely
viewed as directed towards an obvious intended use of which the combination
is capable).

With regard to claim 10, Inagaki as modified teaches the video display device of 
claim 1, wherein the image is a sequence of images of the user containing the 
user gesture, the video display device comprising a camera for acquiring the
sequence of images of the user (see Inagaki figure 11, item 2). 
With regard to claims 11-14, most of the limitations were already shown above with
regard to apparatus claims 1-10 to be obvious, and therefore the method claims
11-14, which correspond to the apparatus, were also obvious. In addition, the
applicant is now specifically claiming, "determining whether the received audio
indication is one of a plurality of expected audio indications: analyzing a gesture
of the user if the received audio indication is one of the plurality of expected
audio indications" (See Pavlovic figure 7, where he illustrates a plurality of
"expected audio indications", SPEECH, and a plurality of "expected gestures",
GESTURE. Now look at Pavlovic figure 5, where he illustrates the audio
estimator/classifier module receiving and "determining whether the received
audio indication is one of a plurality of expected audio indications", and where
he also illustrates the video estimator/classifier module receiving and
"determining whether the received gesture is one of a plurality of expected
gestures". It is an obvious practice that if either data collection process produces
an error, because the audio indication or gesture used is not from the expected
sets illustrated in figure 7, the next step of "analyzing a gesture of the user if
the received audio indication is one of the plurality of expected audio indications" in the
Feature Integrator will not happen. This is because it is an obvious practice that when
an artificially intelligent or smart device, as illustrated by the combination of
Inagaki/Pavlovic, cannot comprehend the data within a reasonable range of
certainty (or, as stated by Pavlovic, "computes the likelihood"), it simply errors out
in the flow chart and does nothing but wait for further inputs.)
With regard to claims 15-18, the combination of Inagaki and Pavlovic was shown
above, in the rejections of claims 1-14, to read on most of these limitations. In
addition, to summarize, a feature directed towards a stored program implementing
this process is inherent to the automatic computer system taught by the
combination of Inagaki and Pavlovic.

With regard to claim 20, Inagaki as modified was shown above, in the rejections
of claims 1-18, to read on most of these limitations. In addition, to summarize, a
specific feature is directed towards, "wherein the processor is configured to analyze image
information received from the user after the audio indication is received to identify
the change in the PIP display characteristic that is expressed by the received
gesture" (See Pavlovic figure 5 and specifically the rejection of 11 above).
With regard to claim 21, see the rejection above, note that the device of Inagaki 
as modified is a computer performing data inputting functions, and therefore 
includes the program segments for performing each of the functions. 

Response to Arguments 

6. Applicant's arguments with respect to claims 1-18 and 20-21 have been
considered but are moot in view of the new ground(s) of rejection.

As to applicant's main argument with respect to the newly added limitation of
"wherein the processor is configured to receive the related gesture from the user
in response to the received audio command", note that Pavlovic does
demonstrate the concept of a system utilizing a combination of "audio
commands" and a "related gesture" from a user as a means of controlling a
graphical object on a display, which is analogous to where Inagaki controlled a
specific graphical object such as a PIP on a display (see Pavlovic page 123, section 3,
Experimental Results). Therefore, it would have been obvious to one
of ordinary skill in the art at the time of the invention to use a "received audio
command and related gesture from a user", as taught by Pavlovic, in the
apparatus of Inagaki, because of the motivation directly provided by Pavlovic:
"Psychological studies, for example, show that people prefer to use hand
gestures in combination with speech in a virtual environment, since they allow
the user to interact without special training or special apparatus". Pavlovic
further teaches that "words or gestures alone can be used". Therefore, it would
have been obvious to one of ordinary skill in the art at the time of the invention to
use words and gestures alternatively, or simultaneously, to control the data
inputting, since it merely depends on the user's preference and the type of
application being used. Any level of integration of the voice commands and
gesture commands would perform equally well in providing input to the computer.
Furthermore, it would have been an obvious matter of design choice to choose
whether to enter a voice command first, then a gesture command, or in the opposite
order, since it merely depends on the function being performed and the
assignments of the commands. For example, if movement of the cursor is
controlled by gesture commands and selection of a menu item is input by voice
commands, then whether a voice command or a gesture command is needed
first would depend on the current position of the cursor: gesture commands first if
the user needs to move the cursor, but voice commands first if the user wants to
select the currently highlighted menu item (this reads on the limitation of "the
processor is configured to receive the related gesture from the user in response
to the received audio command"). As evidence, Cox teaches a data inputting
system for a computer using voice commands and gesture commands, wherein
some voice commands trigger input from gesture commands (column 5, lines 10-19).

The remainder of the pertinent topics for argument are present in the appropriate 
rejections above. 

Conclusion 

7. The prior art made of record and not relied upon is considered pertinent to 
applicant's disclosure. 

Peters (US Patent No. 6,243,683), Higaki et al (US 2002/0181773), and Yamashita et
al (US Patent No. 6,509,707) are made of record for teaching data input systems
using voice commands and gesture commands.

CONTACT INFORMATION 

Any inquiry concerning this communication or earlier communications from the 
examiner should be directed to Kent Chang whose telephone number is 571-272-7667. 
The examiner can normally be reached on Monday to Thursday from 9:00 AM to 6:00 
PM. 
If attempts to reach the examiner by telephone are unsuccessful, the examiner's
supervisor, Sumati Lefkowitz, can be reached at 571-272-3638. 

Any response to this action should be mailed to: 



or faxed to: 

571-273-8300 

Hand-delivered responses should be brought to the Customer Service Window, now 
located at the Randolph Building, 401 Dulany Street, Alexandria, VA 22314.
Information regarding the status of an application may be obtained from the 
Patent Application Information Retrieval (PAIR) system. Status information for 
published applications may be obtained from either Private PAIR or Public PAIR. 
Status information for unpublished applications is available through Private PAIR only. 
For more information about the PAIR system, see http://pair-direct.uspto.gov. Should 
you have questions on access to the Private PAIR system, contact the Electronic 
Business Center (EBC) at 866-217-9197 (toll-free). 